28 research outputs found

    Development of Energy Models for Design Space Exploration of Embedded Many-Core Systems

    This paper introduces a methodology to develop energy models for the design space exploration of embedded many-core systems. The design process of such systems can benefit from sophisticated models: software and hardware can be specifically optimized based on comprehensive knowledge of the application scenario and hardware behavior. The contribution of our work is an automated framework that estimates the energy consumption at an arbitrary abstraction level without requiring further information about the system. We validated our framework with the configurable many-core system CoreVA-MPSoC. Compared to a gate-level simulation of the CoreVA-MPSoC in a 28 nm FD-SOI standard cell technology, our framework shows an average estimation error of about 4%. Presented at HIP3ES 2018.
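
    As a rough illustration of what such an estimation framework computes (this is not the authors' implementation; the component names, per-event energy costs, and counts below are invented), the following Python sketch sums per-event energies over abstract event counts and reports the relative error against an assumed gate-level reference:

        # Hypothetical sketch: estimate energy from abstract event counts and
        # per-event energy costs, then compare against a gate-level reference.
        # All component names and numbers are invented for illustration.

        ENERGY_PER_EVENT_PJ = {  # assumed energy cost per event in picojoules
            "cpu_cycle": 10.0,
            "local_mem_access": 2.5,
            "noc_flit": 5.0,
        }

        def estimate_energy_pj(event_counts):
            """Sum energy over all counted events of a simulated application run."""
            return sum(ENERGY_PER_EVENT_PJ[event] * count
                       for event, count in event_counts.items())

        def estimation_error(estimate_pj, gate_level_pj):
            """Relative error of the abstract estimate vs. a gate-level simulation."""
            return abs(estimate_pj - gate_level_pj) / gate_level_pj

        if __name__ == "__main__":
            counts = {"cpu_cycle": 1_000_000, "local_mem_access": 250_000, "noc_flit": 40_000}
            est = estimate_energy_pj(counts)
            ref = 11.0e6  # assumed gate-level result in pJ
            print(f"estimate: {est / 1e6:.2f} uJ, error: {estimation_error(est, ref):.1%}")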

    System-Level Analysis of Network Interfaces for Hierarchical MPSoCs

    Ax J, Sievers G, Flasskamp M, Kelly W, Jungeblut T, Porrmann M. System-Level Analysis of Network Interfaces for Hierarchical MPSoCs. In: Proceedings of the 8th International Workshop on Network on Chip Architectures (NoCArc). New York, NY, USA: ACM; 2015: 3-8.
    Network Interfaces (NIs) are used in Multiprocessor System-on-Chips (MPSoCs) to connect CPUs to a packet-switched Network-on-Chip. In this work we introduce a new NI architecture for our hierarchical CoreVA-MPSoC. The CoreVA-MPSoC targets streaming applications in embedded systems. The main contribution of this paper is a system-level analysis of different NI configurations, considering both software and hardware costs for NoC communication. Different configurations of the NI are compared using a benchmark suite of 10 streaming applications. The best performing NI configuration shows an average speedup of 20 for a CoreVA-MPSoC with 32 CPUs compared to a single CPU. Furthermore, we present physical implementation results using a 28 nm FD-SOI standard cell technology. A hierarchical MPSoC with 8 CPU clusters and 4 CPUs in each cluster running at 800 MHz requires an area of 4.56 mm².
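
    A minimal sketch of the speedup comparison reported above, assuming invented benchmark names, throughput figures, and NI configuration labels (ni_dma, ni_no_dma); the geometric mean is used here as one reasonable way to average per-benchmark speedups and is not necessarily the averaging used in the paper:

        # Hypothetical sketch: per-benchmark and average speedup of MPSoC
        # configurations over a single-CPU baseline, as in a system-level
        # comparison of NI configurations. All numbers are invented.

        from statistics import geometric_mean

        # Throughput (items/s) per benchmark for one CPU and two assumed NI configs.
        single_cpu = {"fft": 1.0e5, "filterbank": 4.0e4, "des": 2.0e4}
        ni_configs = {
            "ni_dma":    {"fft": 2.2e6, "filterbank": 8.5e5, "des": 3.8e5},
            "ni_no_dma": {"fft": 1.5e6, "filterbank": 6.0e5, "des": 2.9e5},
        }

        def speedups(config_throughput):
            """Speedup of one configuration over the single-CPU baseline, per benchmark."""
            return {b: config_throughput[b] / single_cpu[b] for b in single_cpu}

        for name, throughput in ni_configs.items():
            s = speedups(throughput)
            print(name, {b: round(v, 1) for b, v in s.items()},
                  "geomean:", round(geometric_mean(s.values()), 1))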

    An Abstract Model for Performance Estimation of the Embedded Multiprocessor CoreVA-MPSoC Technical Report (v1.0)

    Ax J, Flasskamp M, Sievers G, Klarhorst C, Jungeblut T, Kelly W. An Abstract Model for Performance Estimation of the Embedded Multiprocessor CoreVA-MPSoC. Technical Report (v1.0); 2015.

    Performance Estimation of Streaming Applications for Hierarchical MPSoCs

    Flasskamp M, Sievers G, Ax J, et al. Performance Estimation of Streaming Applications for Hierarchical MPSoCs. In: Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools (RAPIDO). New York, NY: ACM Press; 2016: 1.

    CoreVA-MPSoC: A Many-core Architecture with Tightly Coupled Shared and Local Data Memories

    Ax J, Sievers G, Daberkow J, et al. CoreVA-MPSoC: A Many-core Architecture with Tightly Coupled Shared and Local Data Memories. IEEE Transactions on Parallel and Distributed Systems. 2018;29(5):1030-1043.

    Evaluation of interconnect fabrics for an embedded MPSoC in 28 nm FD-SOI

    Embedded many-core architectures contain dozens to hundreds of CPU cores that are connected via a highly scalable NoC interconnect. Our Multiprocessor System-on-Chip CoreVA-MPSoC combines the advantages of tightly coupled bus-based communication with the scalability of NoC approaches by adding a CPU cluster as an additional level of hierarchy. In this work, we analyze different cluster interconnect implementations with 8 to 32 CPUs and compare them in terms of resource requirements and performance to hierarchical NoC approaches. Using 28 nm FD-SOI technology, the area requirement for 32 CPUs and an AXI crossbar is 5.59 mm², including 23.61% for the interconnect, at a clock frequency of 830 MHz. In comparison, a hierarchical MPSoC with 4 CPU clusters and 8 CPUs in each cluster requires only 4.83 mm², including 11.61% for the interconnect. To evaluate the performance, we use a compiler for streaming applications to map programs to the different MPSoC configurations. We use this approach for a design-space exploration to find the most efficient architecture and partitioning for an application.
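
    The area figures above can be reproduced with simple bookkeeping; the sketch below recomputes the interconnect area from the quoted totals and shares, and adds an invented relative-throughput column plus a throughput-per-area metric purely to illustrate how a design-space exploration might rank configurations:

        # Sketch: compare MPSoC configurations by area and interconnect share.
        # The two area/share entries follow the abstract; the relative
        # throughput values and the efficiency metric are invented.

        configs = [
            # (name, total area in mm^2, interconnect share, relative throughput)
            ("32 CPUs, AXI crossbar",    5.59, 0.2361, 1.00),
            ("4 clusters x 8 CPUs, NoC", 4.83, 0.1161, 0.95),
        ]

        for name, area_mm2, ic_share, rel_throughput in configs:
            ic_area = area_mm2 * ic_share
            efficiency = rel_throughput / area_mm2  # assumed metric: throughput per mm^2
            print(f"{name}: interconnect {ic_area:.2f} mm^2, "
                  f"efficiency {efficiency:.3f} per mm^2")

        best = max(configs, key=lambda c: c[3] / c[1])
        print("most area-efficient configuration:", best[0])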

    Development of Energy Models for Design Space Exploration of Embedded Many-Core Systems

    Klarhorst C, Flasskamp M, Ax J, et al. Development of Energy Models for Design Space Exploration of Embedded Many-Core Systems. Presented at the 6th International Workshop on High Performance Energy Efficient Embedded Systems (HIP3ES 2018), Manchester, United Kingdom.

    Scalable mapping of streaming applications onto MPSoCs using optimistic mixed integer linear programming

    Embedded streaming applications are facing increasingly demanding performance requirements in terms of throughput. A common mechanism for providing high compute power with a low energy budget is to use a very large number of low-power cores, often in the form of a Massively Parallel System on Chip (MPSoC). The challenge with programming such massively parallel systems is deciding how to optimally map the computation to individual cores to maximize throughput. In this work we present an automatic parallelizing compiler for the StreamIt programming language that efficiently and effectively maps computation to individual cores. The compiler must be both effective, meaning that it does a good job of optimizing for throughput, and efficient, meaning that the time taken to find such a mapping scales well as the number of cores and the size of the stream program increase. We improve on previous work that used Integer Linear Programming (ILP) to map StreamIt programs to multicore systems by formulating the mapping problem differently, using mostly real rather than integer variables. Using this so-called Mixed Integer Linear Programming (MILP) formulation dramatically reduces the solving cost compared to standard ILP. The alternative formulation creates what we call an optimistic solution, which then needs to be adjusted slightly to obtain a final feasible solution. We show that this new approach is always close to, if not better than, the previous approach in terms of effectiveness, while being dramatically better in terms of scalability and efficiency.
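
    The Python sketch below (using the PuLP library) illustrates the general "optimistic relaxation, then repair" idea described above: the filter-to-core assignment is relaxed to real-valued variables, the bottleneck core load is minimized, and the fractional solution is then rounded to a feasible integer mapping. It is a simplified stand-in rather than the paper's MILP formulation, and the filter work estimates, core count, and rounding heuristic are all assumptions:

        # Hypothetical sketch of relax-then-repair mapping, not the paper's
        # actual MILP formulation. Filter work estimates, the core count, and
        # the rounding heuristic are assumptions. Requires PuLP (pip install pulp).

        from pulp import LpProblem, LpMinimize, LpVariable, lpSum, PULP_CBC_CMD

        work = {"source": 5, "fir1": 40, "fir2": 40, "combine": 20, "sink": 5}  # assumed filter costs
        cores = range(4)

        prob = LpProblem("optimistic_mapping", LpMinimize)
        x = {(f, c): LpVariable(f"x_{f}_{c}", lowBound=0, upBound=1)  # fractional assignment
             for f in work for c in cores}
        bottleneck = LpVariable("bottleneck", lowBound=0)

        prob += bottleneck                             # objective: minimize the bottleneck load
        for f in work:                                 # every filter is fully assigned
            prob += lpSum(x[f, c] for c in cores) == 1
        for c in cores:                                # each core's load bounds the bottleneck
            prob += lpSum(work[f] * x[f, c] for f in work) <= bottleneck

        prob.solve(PULP_CBC_CMD(msg=False))

        # Repair step: give each filter to the core holding its largest fraction.
        mapping = {f: max(cores, key=lambda c: x[f, c].value()) for f in work}
        loads = {c: sum(work[f] for f in work if mapping[f] == c) for c in cores}
        print("optimistic (relaxed) bottleneck:", bottleneck.value())
        print("feasible mapping:", mapping, "actual bottleneck:", max(loads.values()))

    Rounding the relaxed solution can only increase the bottleneck, so the relaxed objective is an optimistic lower bound that the repair step turns into a feasible mapping, mirroring the adjustment step the abstract describes.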